Finite-sample analysis of least-squares policy iteration

Authors

  • Alessandro Lazaric
  • Mohammad Ghavamzadeh
  • Rémi Munos
Abstract

In this paper, we report a performance bound for the widely used least-squares policy iteration (LSPI) algorithm. We first consider the problem of policy evaluation in reinforcement learning, that is, learning the value function of a fixed policy, using the least-squares temporal-difference (LSTD) learning method, and report a finite-sample analysis for this algorithm. To do so, we first derive a bound on the performance of the LSTD solution evaluated at the states generated by the Markov chain and used by the algorithm to learn an estimate of the value function. This result is general in the sense that no assumption is made on the existence of a stationary distribution for the Markov chain. We then derive generalization bounds in the case where the Markov chain possesses a stationary distribution and is β-mixing. Finally, we analyze how the error at each policy-evaluation step is propagated through the iterations of a policy iteration method, and derive a performance bound for the LSPI algorithm.
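As context for the policy-evaluation step the abstract analyzes, the LSTD estimator solves a small linear system built from sampled transitions. The sketch below is a minimal illustration, not the paper's analysis: the function name `lstd`, the precomputed feature matrices, and the ridge term `reg` (used only to keep the system invertible on short trajectories) are assumptions of this example.

```python
import numpy as np

def lstd(phi, phi_next, rewards, gamma=0.9, reg=1e-6):
    """Least-squares temporal-difference (LSTD) solution.

    phi:      (n, d) features of visited states s_t
    phi_next: (n, d) features of successor states s_{t+1}
    rewards:  (n,)   observed rewards r_t

    Solves A w = b with A = Phi^T (Phi - gamma * Phi') and b = Phi^T r;
    the resulting w parameterizes the value estimate V(s) = phi(s)^T w.
    """
    d = phi.shape[1]
    A = phi.T @ (phi - gamma * phi_next) + reg * np.eye(d)
    b = phi.T @ rewards
    return np.linalg.solve(A, b)
```

With tabular (one-hot) features and enough samples, this recovers the exact value function of the fixed policy; with generic features it returns the LSTD fixed point whose finite-sample behavior the paper bounds.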


Similar articles

Convergence Proofs of Least Squares Policy Iteration Algorithm for High-Dimensional Infinite Horizon Markov Decision Process Problems

Most of the current theory for dynamic programming algorithms focuses on finite state, finite action Markov decision problems, with a paucity of theory for the convergence of approximation algorithms with continuous states. In this paper we propose a policy iteration algorithm for infinite-horizon Markov decision problems where the state and action spaces are continuous and the expectation cann...

Regularized Policy Iteration

In this paper we consider approximate policy-iteration-based reinforcement learning algorithms. In order to implement a flexible function approximation scheme we propose the use of non-parametric methods with regularization, providing a convenient way to control the complexity of the function approximator. We propose two novel regularized policy iteration algorithms by adding L2-regularization to...

Least-squares methods for policy iteration

Approximate reinforcement learning deals with the essential problem of applying reinforcement learning in large and continuous state-action spaces, by using function approximators to represent the solution. This chapter reviews least-squares methods for policy iteration, an important class of algorithms for approximate reinforcement learning. We discuss three techniques for solving the core, po...

Learning an Exercise Policy for American Options from Real Data

We study approaches to learning an exercise policy for American options directly from real data. We investigate an approximate policy iteration method, namely, least squares policy iteration (LSPI), for the problem of pricing American options. We also extend the standard least squares Monte Carlo (LSM) method of Longstaff and Schwartz, by composing sample paths from real data. We test the perfo...

Global least squares solution of matrix equation $\sum_{j=1}^s A_jX_jB_j = E$

In this paper, an iterative method is proposed for solving the matrix equation $\sum_{j=1}^s A_jX_jB_j = E$. This method is based on the global least squares (GL-LSQR) method for solving linear systems of equations with multiple right-hand sides. For applying the GL-LSQR algorithm to solve the above matrix equation, a new linear operator, its adjoint and a new inner product are defined. It is p...
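For illustration, the least-squares problem in this snippet can be written in vectorized form via the identity vec(A X B) = (Bᵀ ⊗ A) vec(X). The dense toy solver below shows that reformulation only; it is not the paper's GL-LSQR method, which works operator-wise precisely to avoid forming these Kronecker products.

```python
import numpy as np

def solve_sum_kron(A_list, B_list, E):
    """Least-squares solution of sum_j A_j X_j B_j = E by vectorization.

    Stacks the blocks (B_j^T kron A_j) into one matrix M and solves
    M x = vec(E) in the least-squares sense, then unpacks x into the
    matrices X_j. Dense and exponential in memory -- illustrative only.
    """
    blocks = [np.kron(B.T, A) for A, B in zip(A_list, B_list)]
    M = np.hstack(blocks)
    x, *_ = np.linalg.lstsq(M, E.flatten(order="F"), rcond=None)
    # Split the stacked solution vector back into the X_j matrices.
    Xs, pos = [], 0
    for A, B in zip(A_list, B_list):
        rows, cols = A.shape[1], B.shape[0]
        Xs.append(x[pos:pos + rows * cols].reshape(rows, cols, order="F"))
        pos += rows * cols
    return Xs
```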


Journal:
  • Journal of Machine Learning Research

Volume 13, Issue

Pages  -

Publication date: 2012